{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Learning how to move a human arm\n",
    "\n",
    "In this tutorial we will show how to train a basic biomechanical model using `keras-rl`.\n",
    "\n",
    "## Installation\n",
    "\n",
    "To make it work, follow the instructions in\n",
    "https://github.com/stanfordnmbl/osim-rl#getting-started\n",
    "i.e. run\n",
    "\n",
    "    conda create -n opensim-rl -c kidzik opensim python=3.6.1\n",
    "    activate opensim-rl\n",
    "    pip install git+https://github.com/stanfordnmbl/osim-rl.git\n",
    "\n",
    "Then run\n",
    "\n",
    "    pip install keras tensorflow keras-rl jupyter\n",
    "    git clone https://github.com/stanfordnmbl/osim-rl.git\n",
    "    cd osim-rl\n",
    "    \n",
    "follow the instructions and once jupyter is installed and type\n",
    "\n",
    "    jupyter notebook\n",
    "\n",
    "This should open the browser with jupyter. Navigate to this notebook, i.e. to the file `examples/train.arm.ipynb`.\n",
    "\n",
    "## Preparing the environment\n",
    "\n",
    "The following two blocks load necessary libraries and create a simulator environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import osim\n",
    "import numpy as np\n",
    "import sys\n",
    "\n",
    "# Keras libraries \n",
    "from keras.optimizers import Adam\n",
    "\n",
    "import numpy as np\n",
    "from helpers import *\n",
    "\n",
    "from rl.agents import DDPGAgent\n",
    "from rl.memory import SequentialMemory\n",
    "from rl.random import OrnsteinUhlenbeckProcess\n",
    "\n",
    "from keras.optimizers import RMSprop\n",
    "\n",
    "import argparse\n",
    "import math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load arm environment\n",
    "from osim.env import Arm2DEnv\n",
    "env = Arm2DEnv(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating the actor and the critic\n",
    "\n",
    "The actor serves as a brain for controlling muscles. The critic is our approximation of how good is the brain performing for achieving the goal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create networks for DDPG\n",
    "# Next, we build a very simple model.\n",
    "actor = policy_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers = 3, hidden_size = 32)\n",
    "print(actor.summary())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "qfunc = q_nn(env.observation_space.shape[0], env.action_space.shape[0], hidden_layers = 3, hidden_size = 64)\n",
    "print(qfunc[0].summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the actor and the critic\n",
    "\n",
    "We will now run `keras-rl` implementation of the DDPG algorithm which trains both networks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up the agent for training\n",
    "memory = SequentialMemory(limit=100000, window_length=1)\n",
    "random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.action_space.shape)\n",
    "agent = DDPGAgent(nb_actions=env.action_space.shape[0], actor=actor, critic=qfunc[0], critic_action_input=qfunc[1],\n",
    "                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,\n",
    "                  random_process=random_process, gamma=.99, target_model_update=1e-3,\n",
    "                  delta_clip=1.)\n",
    "agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Okay, now it's time to learn something! We visualize the training here for show, but this\n",
    "# slows down training quite a lot. You can always safely abort the training prematurely by\n",
    "# stopping the notebook\n",
    "agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)\n",
    "# After training is done, we save the final weights.\n",
    "# agent.save_weights(args.model, overwrite=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate the results\n",
    "Check how our trained 'brain' performs. Below we will also load a pretrained model (on the larger number of episodes), which should perform better. It was trained exactly the same way, just with a larger number of steps (parameter `nb_steps` in `agent.fit`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# agent.load_weights(args.model)\n",
    "# Finally, evaluate our algorithm for 2 episodes.\n",
    "agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
